Int. J. Exp. Res. Rev., Vol. 39: 190-199 (2024) 
Open Access 


International Journal of Experimental Research and Review (IJERR)
© Copyright by International Academic Publishing House (IAPH)

ISSN: 2455-4855 (Online) 

www.iaph.in


DOI: https://doi.org/10.52756/ijerr.2024.v39spl.015 
Original Article 


Peer Reviewed 




Providing Highest Privacy Preservation Scenario for Achieving Privacy in Confidential Data 




Pinkal Jain*, Vikas Thada and Deepak Motwani 


Department of Computer Science & Engineering, Amity University Gwalior -474001, Madhya Pradesh, India 


E-mail/Orcid Id: 


PJ, pinku029jain@gmail.com, https://orcid.org/0000-0001-8002-320X; VT, vthada@gwa.amity.edu, https://orcid.org/0000-0002-8131-9616;
DM, dmotwani@gwa.amity.edu, https://orcid.org/0000-0002-0217-7155


Article History: 


Received: 24th Dec., 2023
Accepted: 25th May, 2024
Published: 30th May, 2024


Abstract: Machine learning algorithms have been extensively employed in multiple
domains, presenting an opportunity to enable privacy. However, their effectiveness
depends on enormous data volumes and high computational resources, which are
usually available online. Such data often include personal and private information
such as mobile telephone numbers, identification numbers, and medical histories.
Developing efficient and economical techniques to protect this private data is
therefore critical. In this context, the current research proposes a novel approach
that combines modified differential privacy with a more complex machine learning
(ML) model. As this work demonstrates, the suggested method makes it possible to
assess the privacy-specific characteristics of single-level or multiple-level
models. It then employs the gradient values from stochastic gradient descent to
calculate the amount of Gaussian noise, thereby preserving sensitive information
within the data. The experimental results show that, by fine-tuning the parameters
of the modified differential privacy model according to the varied degrees of
private information in the data, our suggested model outperforms existing methods
in terms of accuracy, efficiency and privacy.


Introduction 


IoT devices are data-driven, and the world should concentrate more on
safeguarding data than anything else. The cybersecurity law developed in 2017
contains a stipulation regarding personal privacy protection, including the
personal information of network operators. The illegal use of sensitive
information, i.e., personal information, is prohibited by law (Jain et al., 2023).
Furthermore, in 2018, the European Union issued substantial directives governing
how businesses handle personal data. These principles require the ethical
treatment of individual information, create trust and responsibility in data
management procedures, and make it illegal for business models to gather,
exchange, or analyze data without the user's permission (Abadi et al., 2016;
Jain et al., 2023).

Beyond legal methods to avoid information leaks, effective privacy protection in
ML must draw on the unique properties of ML itself (Bettini and Riboni, 2015;
Mondal et al., 2023). It necessitates building model structures and training
procedures with privacy protection as a top priority, guaranteeing that sensitive
personal information is inaccessible to unauthorized parties throughout the
learning process.

Traditional machine learning techniques follow a centralized method, in which
data collectors gather information from numerous sources before it is examined
by data specialists (Feng et al., 2019; Samadder et al., 2023). This method is
known as centralized learning (Fig. 1). First, in the centralized learning
paradigm, once data have been collected, users have hardly any control over them
and do not know how or where they will be used (Gupta et al., 2020). Second, in
the modern environment, scholars have tried computing global models using
localized data. For instance, federated learning by Google has been in use since
2017. Despite giving users partial control over their private data, this approach
does not enable users to fully mitigate privacy vulnerabilities (Owusu et al.,
2021).

At the same time, privacy protection in machine learning is ensured using
differential privacy algorithms and their diverse modifications. Researchers
have improved differential privacy from three primary perspectives:
gradient-based differential privacy, function-based differential privacy, and
label-based differential privacy (Truex et al., 2019; Kumar et al., 2023). In all
cases, differential privacy pursues a shared goal, which is to add specific noise
in diverse ways and at diverse points of the machine learning process (Pei et al.,
2022; Pal et al., 2023).

Notably, some authors have developed ADLM, a new differential privacy protection
mechanism. During training, ADLM dynamically adjusts the noise level by boosting
the noise in insufficiently correlated neurons. This modification improved model
accuracy, with a reported performance of 84.8 percent on the CIFAR-10 dataset
(Jain et al., 2022). Furthermore, other authors implement a novel deep learning
approach for semi-supervised learning using knowledge transfer techniques
(Claerhout et al., 2005). To achieve high model accuracy together with robust
privacy protection, several teacher models are trained on different data sets and
their predictions are combined. Noise is added while training the student model,
and a highly accurate student model requires accurate teacher models (Bu et al.,
2021).

Figure 1. Centralized Learning Process: A Central Server Collects Data
from Numerous Sources.

For example, Abadi and colleagues used the method to define the connection
between the relationships and stabilize the gradient descent amplifications to
maintain privacy. The challenge associated with the method was its inability to
handle complex models. In addition, the DP-GAN method adds noise to the gradient
calculation of the Wasserstein distance during optimization (Shokri et al., 2015).
Despite the use of generators to improve the quality of the training data, the
approach has effectiveness issues on complex datasets. Moreover, Jain and
coworkers used further privacy methods, including a new layer of privacy
reporting and a gradient descent-based global sensitivity computing layer.
However, the added network layer had limitations on complex networks (Wang et
al., 2020; Yadav and Singh, 2023).

Although many privacy-preserving algorithms have been developed to ensure high
levels of data privacy, adding noise may actually decrease a model's ability to
fit data with high accuracy (Miller et al., 2009). Such a trade-off between data
authenticity and privacy security can considerably decrease the performance of
machine learning models used for classification. Recently, advanced intelligent
data and pattern recognition technologies, especially deep learning, have become
very popular and have attracted significant attention (Zheng et al., 2017). This
facilitates improved prediction accuracy in differential privacy models. Given
these insights, this article proposes a novel approach to safeguarding privacy
through the integration of differential privacy with convolutional networks. In
addition to enhancing privacy protection, this method improves data availability
while still protecting sensitive information in the datasets. In addition, it can
restore training with very small sample sizes and, for large sample sizes, lowers
the success of attacks by a multi-factor of ORd4. The attacks available prior to
our study rely on equation-solving methods, which are most effective on simple
linear binary models (Wang et al., 2020).

The remainder of the paper is organized as follows: Section 2 reviews the
literature, Section 3 presents our algorithm in detail, Section 4 describes the
methodology, Section 5 reports the results and discussion, and Section 6
concludes.


Literature Review 

The following subsections describe the privacy challenges and how machine
learning methodologies are applied to mitigate them.

Navigating Privacy Challenges

The quick evolution of data processing has raised 
security fears regarding sensitive information from many 
quarters. In the domain of machine learning, confidential 
data breaches commonly appear in two ways: 

Direct Privacy Disclosure 

This stems from extensive data collection practices by untrustworthy data
collectors who acquire personal data and share or trade it without individuals'
consent (Zhu et al., 2020).

Indirect Privacy Disclosure 

This arises from the inadequate generalization ability of machine learning
models. In a significant advancement, ADLM, a novel mechanism for differential
privacy protection, was developed (Jain et al., 2023; Pei et al., 2022). This
adjustment boosted accuracy, which reached 84.8% when applied to the CIFAR-10
dataset. Furthermore, there is a deep learning mechanism that uses knowledge
distillation-based techniques: multiple teacher models are trained on different
datasets, and their predictions are combined to introduce noise while training
the student model. This approach offers both high model accuracy and strong
privacy protection. However, an accurate student model requires highly accurate
teachers, which in turn need data for training. Moreover, the compromise between
data authenticity and privacy security could ultimately harm the classification
accuracy of a machine-learning model.

Advanced intelligent data recognition technologies, notably deep learning, have
sparked great interest and allow for enhanced prediction accuracy in
differential privacy models. Given these findings, this study presents a novel
technique for protecting privacy by combining differential privacy with
convolutional networks. This technique not only improves the accuracy of data
but also increases its availability (Feng et al., 2019). It can restore training
with small sample numbers while reducing the efficacy of attacks with large
samples. Early model theft attacks are generally based on equation-solving
algorithms (Kairouz et al., 2019).

Reconstruction Attack

Adversaries attempt to recreate sensitive details about individuals, or a
specific model, from training datasets. These attempts involve techniques such
as model inversion attacks and model theft (Gupta et al., 2020). Model inversion
attacks attempt to extract sensitive information about people via dynamic
analysis or similarity evaluations. The model uses data to strengthen defences
against such breaches, using confidence algorithms to detect built-in virtual
profiles to disclose genuine data. Early model-stealing attacks were based on
equation-solving techniques, but they can be extended to complicated models with
predictive confidence (Arachchige et al., 2019).

Membership Inference Attack (MIA)

Attackers seek to check whether a given sample belongs to the training dataset.
Such inference can have serious repercussions, for example for diagnostic models
created with sensitive medical data (Yuan et al., 2013). In this case, the
attacks are primarily driven by the similarity between data distribution and
model structures.
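
A membership inference attack of this kind can be illustrated with a simple confidence-threshold rule. The sketch below is a generic illustration, not the attack evaluated in this paper: it assumes the attacker only sees the model's class-probability vectors, and the function names, the toy data, and the threshold value are hypothetical.

```python
def infer_membership(confidences, threshold):
    """Guess 'member' when the model's top class probability is high.

    confidences: list of per-sample class-probability lists.
    threshold: hypothetical cut-off chosen by the attacker.
    """
    return [max(probs) >= threshold for probs in confidences]


def attack_success_rate(guesses, is_member):
    """Percentage of samples whose membership was guessed correctly."""
    correct = sum(g == m for g, m in zip(guesses, is_member))
    return 100.0 * correct / len(is_member)


# Toy data (made up): four samples, only the first was in the training set.
confidences = [[0.99, 0.01], [0.55, 0.45], [0.90, 0.10], [0.60, 0.40]]
is_member = [True, False, False, False]

guesses = infer_membership(confidences, threshold=0.8)
print(attack_success_rate(guesses, is_member))
```

An overfitted model tends to be more confident on its training samples, which is why a rule of this type can succeed and why lower attack success rates indicate better protection.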

In addition to the privacy preservation risks, machine learning suffers from
several security challenges. Unlike privacy issues, which can lead to data
leaks, security flaws can jeopardize the operation and accuracy of machine
learning models (Jain et al., 2022). Poisoning and adversarial-sample attacks are
two security concerns that might occur during the model training and
application stages.

Machine Learning for Privacy Preservation

Scenarios that can lead to privacy disclosure require suitable protection
methods, and the scenario must be taken into account when choosing an approach.
Two factors play a vital role in executing these approaches: the first is
reliability, which depends on the distribution of training data, and the second
is whether the model remains effective in the presence of noise (Zhu et al.,
2020; Malin et al., 2004).

Machine Learning Techniques

Machine learning techniques include supervised, unsupervised, and reinforcement
learning. Training approaches include centralized, distributed, and
collaborative learning models. Each approach handles training datasets
differently and influences privacy concerns (Bonawitz et al., 2019).

Classification of Privacy Protection Technologies

Typical privacy protection methods include DP (Differential Privacy), HE
(Homomorphic Encryption), and SMC (Secure Multiparty Computing). The level of
security must be matched to the level of data privacy required (Zhang et al.,
2020).

To sum up, addressing privacy concerns in machine learning takes consistent
efforts on multiple fronts: regulation, building a better model, and utilizing
various PPT research (Cui et al., 2019).


Proposed Work 

This section provides the details of the proposed work and related definitions.
The proposed work incorporates the properties of the modified differential
privacy technique and Gaussian distributions. It first determines the privacy
budget of each layer in the neural network. It then uses the gradient values from
stochastic gradient descent to calculate the amount of Gaussian noise to add,
preserving sensitive data.
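
The noise-injection step described above can be sketched as gradient clipping followed by Gaussian perturbation, in the style of differentially private SGD (Abadi et al., 2016). This is a minimal sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, gradients are plain Python lists, and the noise standard deviation is assumed to be sigma times the clipping threshold.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_average_gradient(per_example_grads, clip_norm, sigma, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    Assumption: noise std = sigma * clip_norm, added once per coordinate.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, clip_norm)):
            total[i] += g
    noise_std = sigma * clip_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, noise_std)) / n for t in total]


# Toy batch of two per-example gradients (made-up values).
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
print(noisy_average_gradient(grads, clip_norm=4.0, sigma=2.0, rng=rng))
```

Clipping bounds the influence of any single record on the update, so that the added Gaussian noise can mask the presence or absence of that record.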


Figure 2. Model Generation Process Using Machine Learning. (Figure labels: Model type; Semi-supervised learning; Unsupervised learning; Logistic regression; Random forests; Bayes algorithm; K-means; Limit of learning; Generate Models.)


Corresponding Terminology

Microsoft announced differential privacy in 2006. It creates a rigorous
mathematical framework for analyzing privacy. Privacy can be achieved by adding
noise to the original data while maintaining its integrity.

Two remarkable things about it are that it is indifferent to any particular
concept of an attacker and concerned only with data privacy. Formally,
differential privacy is stated as (ε, δ)-differential privacy, where ε is the
level of privacy guarantee while δ is the additive error. In simple terms, for
any two training data sets D and D', where D' differs from D in at most one
record, and for any result R, an algorithm A satisfies (ε, δ)-differential
privacy if

$$\Pr[A(D') = R] \le e^{\varepsilon} \cdot \Pr[A(D) = R] + \delta$$

A smaller ε indicates better privacy protection.

How Algorithms Work

There are two main challenges to deploying the (ε, δ)-DPP (Differential Privacy
Preservation) technique:

1. Selecting where to insert the noise.

2. Deploying resources effectively.

The proposed technique addresses these difficulties by incorporating different
levels of privacy into neural network training.


Methodology

The proposed method effectively integrates the properties of adjusted DP
(differential privacy) and Gaussian distributions. This makes it possible to
determine the explicit privacy budgets of every layer within a neural network
model. It uses the gradient values from the SGD (Stochastic Gradient Descent)
algorithm to determine how much Gaussian noise should be added. The outcomes
show that, by fine-tuning the parameter values of the modified DPM (Differential
Privacy Model), our suggested model performs well in terms of accuracy,
efficiency, and privacy.
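
To make the idea of per-layer privacy budgets concrete, the following sketch splits a total ε across layers and converts each share into a Gaussian noise scale. It is illustrative only: the proportional split, the layer weights, and δ = 1e-5 are assumptions that do not come from this paper, and the noise scale uses the classical Gaussian-mechanism bound σ ≥ sqrt(2 ln(1.25/δ)) · Δ / ε, which holds for 0 < ε < 1.

```python
import math


def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise std from the classical Gaussian-mechanism bound (0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the classical bound assumes 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def split_budget(total_epsilon, layer_weights):
    """Split a total epsilon across layers in proportion to layer_weights.

    Under basic sequential composition the per-layer values sum to the total.
    """
    total_weight = sum(layer_weights)
    return [total_epsilon * w / total_weight for w in layer_weights]


total_epsilon = 0.5              # privacy level quoted in the figure captions
delta = 1e-5                     # assumed; not reported in the paper
clip_norm = 4.0                  # gradient threshold C = 4, used as sensitivity
layer_weights = [1.0, 2.0, 1.0]  # hypothetical importance of three layers

for layer, eps in enumerate(split_budget(total_epsilon, layer_weights)):
    print(layer, eps, gaussian_sigma(eps, delta, clip_norm))
```

A layer that receives a smaller share of ε gets a larger noise scale, which is the trade-off that tuning per-layer budgets is meant to control.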

The (ε, δ)-DPP (Differential Privacy Protection) technique is implemented and
tested for experimental purposes. The results focus on the privacy-preserving
capabilities of the algorithm while ensuring that data utility is not
compromised. The evaluation also covers privacy preservation and model
availability to ascertain how well the algorithm performs under real-world
conditions. This work also aims to evaluate the performance of DCGAN (Deep
Convolutional Generative Adversarial Network).

To achieve these goals, the proposed algorithm generates synthetic data using
DCGAN and compares it with the original dataset to check for closeness. If the
similarity is above a certain predetermined threshold, the model is fine-tuned
until it meets these criteria.
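
The closeness check can be sketched as follows. The paper does not state which similarity measure or threshold is used, so the cosine similarity between mean feature vectors, the function names, and the threshold of 0.9 are all assumptions made for illustration.

```python
import math


def mean_vector(samples):
    """Coordinate-wise mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def needs_fine_tuning(real, synthetic, threshold):
    """Flag the model when synthetic data is closer to real data than allowed."""
    similarity = cosine_similarity(mean_vector(real), mean_vector(synthetic))
    return similarity > threshold


# Toy 2-D feature vectors (made up) standing in for real and generated images.
real = [[1.0, 0.0], [1.0, 0.0]]
synthetic = [[1.0, 0.0], [0.9, 0.1]]
print(needs_fine_tuning(real, synthetic, threshold=0.9))
```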

Through experiments, we aim to show that our algorithm achieves a good
privacy-utility trade-off. The performance of the algorithm is measured through
the preservation of privacy and the availability of the model.

For our experiments, we used an Intel(R) Xeon(R) CPU E5-2603 v3 @ 1.60 GHz with
8 GB of memory. In addition, the system comprises two Titan X GPUs and runs
Ubuntu 16.04. For all of our experiments, we used Python with the TensorFlow 1.0
framework, built using Bazel 0.3.1. The MNIST dataset used for the experiment
contains 60,000 training and 10,000 testing samples.

Figure 3. Methodology of the Proposed Work. (Figure labels: Generate data; Discriminators; Extraction of image.)


Results & Discussions

Dataset Used

For experimental purposes, we consider the MNIST dataset with C = 4 (gradient
threshold) and PCA-reduced to 60 dimensions. The work measures the effectiveness
and efficiency of the functions under different privacy constraints ε along with
the allowable bias limit, i.e., δ, which depends strongly on the variance scale,
i.e., σ.

Experimental work

For σ = 8 (Figure 4), our algorithm performs poorly on both the training set and
the test set, i.e., larger noise scales undermine training quality while still
maintaining the privacy of the testing dataset (Jain et al., 2023). For σ set to
4 (Figure 5), the algorithm's performance diminished over time, indicating
improved balance. The most consistent results were produced with σ = 2, and the
model's performance was observed. We also conducted single-sample label
inference attacks to assess the robustness against this kind of attack. The
results indicated a significant (p < 0.05) negative correlation between the
attack success rate and the classification accuracy of the model. Overfitting
decreased the model's ability to generalize and how well it defended against
inference.

Table 1. Classification Accuracy and Success Rate on Different Size of Data.

Size of Data   Epoch   Accuracy %   Attack success %
10000          10      98.52        12.14
20000          10      92.34        20.74
30000          10      90.42        26.05
10000          25      85.18        33.54
20000          25      56.75        46.98
30000          25      52.25        41.25
10000          50      48.74        59.56
20000          50      22.89        79.52
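
The negative relationship between classification accuracy and attack success rate can be checked directly against the eight rows of Table 1. The short sketch below computes the Pearson correlation coefficient for those values; the `pearson` helper is written here for illustration, and the paper does not state which test produced the reported significance level.

```python
import math

# Accuracy (%) and attack success (%) columns of Table 1, row by row.
accuracy = [98.52, 92.34, 90.42, 85.18, 56.75, 52.25, 48.74, 22.89]
attack_success = [12.14, 20.74, 26.05, 33.54, 46.98, 41.25, 59.56, 79.52]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(accuracy, attack_success)
print(round(r, 3))  # strongly negative for the Table 1 values
```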
As expected, increasing the number of training samples and epochs led to higher
overfitting, reduced classification accuracy, and an increased inference attack
success rate. However, when we take 10,000 samples for model training and
perform 10 epochs, the trained model gives an effective classification accuracy
of 98.75% and a 13.14% inference attack success rate.

Figure 4. Outcomes at Variance σ=8 & Level of Privacy Guarantee ε=0.5.

Figure 5. Outcomes at Variance σ=4 & Level of Privacy Guarantee ε=0.5.

Figure 6. Accuracy and Attack Success Rate on Different Number of Epochs.

Comparison between the Proposed and Existing Systems

The proposed technique is compared with the existing one. The difference between
the proposed model and a CNN model in terms of classification accuracy and
inference attacks is shown in Table 2.


Table 2. Comparison of Classification Accuracy between Proposed and Existing Works.

Size of   Total    Proposed Model   Proposed Success Rate   Existing (CNN)   CNN Success Rate
Data      Epochs   Accuracy (%)     of Attacks (%)          Accuracy (%)     of Attacks (%)
10000     10       98.97            10.75                   95.38            95.25
20000     10       98.52            12.16                   95.12            87.56
30000     10       97.91            12.33                   93.65            66.12
10000     25       97.59            13.75                   94.74            78.15
20000     25       99.02            11.94                   95.78            64.22
30000     25       96.78            12.12                   94.77            60.57
10000     50       97.36            12.28                   94.29            65.72
20000     50       98.22            10.74                   93.15            59.87
30000     50       99.02            11.72                   96.46            53.85


Table 2 compares the classification accuracy and inference attack success rate
of the proposed work with a CNN-based work (Pei et al., 2022). The proposed work
outperformed the CNN in terms of both classification accuracy and defence
against attacks. With 50 training rounds for the CNN, its attack accuracy and
classification accuracy decreased, indicating overfitting. A comparison between
the proposed and existing techniques is shown in Figure 7.
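
As a quick arithmetic summary of Table 2, the sketch below averages each column over the nine configurations. The values are copied from the table; the averaging itself is added here for illustration and is not a statistic reported by the authors.

```python
from statistics import mean

# Rows of Table 2:
# (size, epochs, proposed_acc, proposed_attack, cnn_acc, cnn_attack), all in %.
table2 = [
    (10000, 10, 98.97, 10.75, 95.38, 95.25),
    (20000, 10, 98.52, 12.16, 95.12, 87.56),
    (30000, 10, 97.91, 12.33, 93.65, 66.12),
    (10000, 25, 97.59, 13.75, 94.74, 78.15),
    (20000, 25, 99.02, 11.94, 95.78, 64.22),
    (30000, 25, 96.78, 12.12, 94.77, 60.57),
    (10000, 50, 97.36, 12.28, 94.29, 65.72),
    (20000, 50, 98.22, 10.74, 93.15, 59.87),
    (30000, 50, 99.02, 11.72, 96.46, 53.85),
]

proposed_acc = mean(row[2] for row in table2)
proposed_attack = mean(row[3] for row in table2)
cnn_acc = mean(row[4] for row in table2)
cnn_attack = mean(row[5] for row in table2)

print(f"proposed: accuracy {proposed_acc:.2f}%, attack success {proposed_attack:.2f}%")
print(f"CNN:      accuracy {cnn_acc:.2f}%, attack success {cnn_attack:.2f}%")
```

In every row the proposed model has both the higher accuracy and the lower attack success rate, which is consistent with the comparison stated in the text.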


Conclusion and Future Work 

This paper addresses the following privacy question: is it possible to achieve
privacy in machine learning tasks without losing classification accuracy? The
proposed approach addresses this issue by blending tailored differential
privacy methods with deep learning, which effectively preserves private
information in the training data. During parameter optimization of the network
model, we inject noise within a modified DP framework. Our experimental results
show that there is a trade-off between the accessibility of DNN training
datasets and privacy leakage. The method achieves high classification accuracy
without revealing too much information. Such an approach can form a solid
foundation for user privacy in machine learning problems at scale, and it offers
users significantly more granular control over their own data. Going forward, we
are going to focus on the efficiency and robustness of the technique.